A Machine Learning Approach to Clinical Terms Normalization

نویسندگان

  • José Castaño
  • Maria Laura Gambarte
  • Hee Joon Park
  • Maria del Pilar Avila Williams
  • David Pérez-Rey
  • Fernando Campos
  • Daniel R. Luna
  • Sonia E. Benitez
  • Hernán Berinsky
  • Sofía Zanetti
چکیده

We propose a machine learning approach for semantic recognition and normalization of clinical term descriptions. Clinical terms considered here are noisy descriptions in Spanish language written by health care professionals in our electronic health record system. These description terms contain clinical findings, family history, suspected disease, among other categories of concepts. Descriptions are usually very short texts presenting high lexical variability containing synonymy, acronyms, abbreviations and typographical errors. Mapping description terms to normalized descriptions requires medical expertise which makes it difficult to develop a rule-based knowledge engineering approach. In order to build a training dataset we use those descriptions that have been previously matched by terminologists to the hospital thesaurus database. We generate a set of feature vectors based on pairs of descriptions involving their individual and joint characteristics. We propose an unsupervised learning approach to discover term equivalence classes including synonyms, abbreviations, acronyms and frequent typographical errors. We evaluate different combinations of features to train MaxEnt and XGBoost models. Our system achieves an F1 score of 89% on the Hospital Italiano de Buenos Aires (HIBA) problem list.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Approach Based on Higher Order Spectra for Clinical Recognition of Seizure and Epilepsy Using Brain Activity

Introduction: This paper proposes a reliable and efficient technique to recognize different epilepsy states, including healthy, interictal, and ictal states, using Electroencephalogram (EEG) signals. Methods: The proposed approach consists of pre-processing, feature extraction by higher order spectra, feature normalization, feature selection by genetic algorithm and ranking method, and classif...

متن کامل

Machine Learning Models for Housing Prices Forecasting using Registration Data

This article has been compiled to identify the best model of housing price forecasting using machine learning methods with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...

متن کامل

Emotion Detection in Persian Text; A Machine Learning Model

This study aimed to develop a computational model for recognition of emotion in Persian text as a supervised machine learning problem. We considered Pluthchik emotion model as supervised learning criteria and Support Vector Machine (SVM) as baseline classifier. We also used NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...

متن کامل

NCBI at 2013 ShARe/CLEF eHealth Shared Task: Disorder Normalization in Clinical Notes with Dnorm

We describe an application of DNorm – a mathematically principled and high performing methodology for disease recognition and normalization, even in the presence of term variation – to clinical notes. DNorm consists of a text processing pipeline, including the BANNER named entity recognizer to locate diseases in the text, and a novel machine learning approach based on pairwise learning to rank ...

متن کامل

Transparent Machine Learning Algorithm Offers Useful Prediction Method for Natural Gas Density

Machine-learning algorithms aid predictions for complex systems with multiple influencing variables. However, many neural-network related algorithms behave as black boxes in terms of revealing how the prediction of each data record is performed. This drawback limits their ability to provide detailed insights concerning the workings of the underlying system, or to relate predictions to specific ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016